fix: route checkpoint operational logs to stderr by Simon-Free · Pull Request #47 · SafeRL-Lab/cheetahclaws

Simon-Free · 2026-04-17T19:22:39Z

Summary

Route the two [checkpoint] operational prints (large-file skip, backup-failure) to sys.stderr instead of stdout. On stdout they pollute the conversation transcript captured by session tooling; on stderr they stay in the operator's view where they belong.

Changes

File	+/-	What
`checkpoint/store.py`	+3/-2	`import sys` + `file=sys.stderr` on the two `print()` calls in `track_file_edit`
`tests/test_checkpoint_store.py`	+61	Three behavior tests via `capsys`: large file skipped (stderr, not stdout), normal file backed up, `shutil.copy2` failure routed to stderr
`tests/test_checkpoint_e2e.py`	+112	Two end-to-end scenarios driving `agent.run` with a mocked LLM that issues a `Write` tool_call; the Write hook, checkpoint store, and stderr routing all run for real

What was dropped

tests/test_checkpoint_extras.py is deleted. Three stale TestTokenSnapshotExtendedFields cases asserted cache_read / cache_creation fields removed in 620bbb2 per earlier reviewer feedback. The remaining two cases were trivial or file-source text scans; neither tests user behavior.

Scope limit

Only the stderr-routing change is in this PR. The earlier stderr_lines / cache_read / cache_creation snapshot fields mentioned in the previous description are not included: they need producer-side wiring (collecting per-tool stderr into the snapshot, tracking cache tokens from provider responses) which belongs in a follow-up once the plumbing exists.

Ref #43

chauncygu · 2026-04-18T07:23:22Z

Thanks for the contribution! I'd like to ask for this PR to be split into two, because the two changes have very different risk profiles.

A few issues first:

Description doesn't match the diff. The PR says it modifies session_store.py and adds optional stderr/tokens fields to add_entry(). The actual diff modifies checkpoint/store.py
and does something else (redirect existing prints to stderr + add two new keys to token_snapshot). Please update the description so reviewers can follow along.
cache_read / cache_creation are dead fields. The new code reads:
"cache_read": getattr(state, "total_cache_read_tokens", 0),
"cache_creation": getattr(state, "total_cache_creation_tokens", 0)
The tests assert on source strings, not behavior. test_cache_read_in_source / test_cache_creation_in_source just grep the source file for the literal "cache_read" / "cache_creation". These don't verify anything is actually captured and will lock in the implementation without catching real regressions.

Proposed split:

Keep the stderr change as its own small PR. Replacing print(...) with print(..., file=sys.stderr) for those [checkpoint] diagnostics is a clean, safe improvement — happy to merge, that quickly on its own.
Drop the token-field change from this PR. To land it properly, we first need to actually track cache tokens on State — i.e., accumulate cache_read_input_tokens / cache_creation_input_tokens from the API response in providers/ (and wherever the running totals live), then persist them in the checkpoint. Otherwise we're adding schema that carries no information. Please open a separate PR for that work once the producer side is in place, with tests that assert real values flow through (not string matches on source).

Regarding the failing CI on 3.10/3.11 — that's a pre-existing infra issue (sqlalchemy isn't in requirements.txt but tests/test_web_api.py imports it). It's unrelated to your changes and is being fixed separately. Once that fix is on main, please click Update branch on this PR (or rebase) so the new ci.yml takes effect.

Thanks again!

Simon-Free · 2026-04-18T07:42:59Z

I am very sorry for description not matching the diffs, I mistakenly made the PR on cheetahclaws instead of my fork, and thus marked as draft, in order to take the time to manually review, can I undraft them back and ping you whenever I finished to check it ? For both this one and the other

chauncygu · 2026-04-18T08:04:34Z

Sure, Thanks a lot for your great contributions.

The three TestTokenSnapshotExtendedFields cases asserted cache_read / cache_creation fields that were removed in 620bbb2 ("fix: remove dead cache_read/cache_creation fields per review"). They have been failing ever since. Delete test_checkpoint_extras.py -- its remaining cases were either trivial (test_store_imports_sys checks 'import sys' exists) or file-source text scans (TestCheckpointPrintsToStderr) which don't test user behavior. Add tests/test_checkpoint_e2e.py with two real e2e scenarios: - Drive agent.run with a mocked LLM that emits a Write tool_call; assert the checkpoint hook created a pre-edit backup of the original content. - Same path but the file exceeds _MAX_FILE_SIZE -- assert the skip message lands on stderr only, not stdout. This is the actual user-visible contract of PR SafeRL-Lab#47 and covers the full wiring agent.run -> Write hook -> checkpoint.store.track_file_edit. The three behavior tests in test_checkpoint_store.py stay -- they cover the store function directly via capsys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…Lab#43

…rge file skip)

The three TestTokenSnapshotExtendedFields cases asserted cache_read / cache_creation fields that were removed in 620bbb2 ("fix: remove dead cache_read/cache_creation fields per review"). They have been failing ever since. Delete test_checkpoint_extras.py -- its remaining cases were either trivial (test_store_imports_sys checks 'import sys' exists) or file-source text scans (TestCheckpointPrintsToStderr) which don't test user behavior. Add tests/test_checkpoint_e2e.py with two real e2e scenarios: - Drive agent.run with a mocked LLM that emits a Write tool_call; assert the checkpoint hook created a pre-edit backup of the original content. - Same path but the file exceeds _MAX_FILE_SIZE -- assert the skip message lands on stderr only, not stdout. This is the actual user-visible contract of PR SafeRL-Lab#47 and covers the full wiring agent.run -> Write hook -> checkpoint.store.track_file_edit. The three behavior tests in test_checkpoint_store.py stay -- they cover the store function directly via capsys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Simon-Free · 2026-04-21T09:09:15Z

Hey @chauncygu this one looks better to me, but if you have any improvement ideas feel free to share ! Also, i tried to introduce some "e2e" tests (quotes, because we are still mocking LLM + some other things) that simulate the workflow and showcase the fix/feature in action !

… Python versions)

Simon-Free marked this pull request as draft April 17, 2026 19:44

chauncygu marked this pull request as ready for review April 17, 2026 22:20

Simon-Free mentioned this pull request Apr 18, 2026

feat: end-to-end cache token tracking + multi-provider coverage Simon-Free/cheetahclaws#1

Draft

Simon-Free changed the title ~~feat: add stderr capture and token fields to session checkpoints~~ fix: route checkpoint operational logs to stderr Apr 20, 2026

bot and others added 4 commits April 21, 2026 11:03

feat: capture stderr and token metadata in checkpointsnnRef SafeRL-…

164e6c1

…Lab#43

fix: remove test for non-existent distinct_base field

cbb1367

test: add integration tests for checkpoint store (stderr capture + la…

de52523

…rge file skip)

Simon-Free force-pushed the pr3-checkpoint-stderr-tokens branch from b9bf441 to 4bc4a09 Compare April 21, 2026 09:06

Simon FREYBURGER added 2 commits April 21, 2026 14:46

test: add shared conftest.py with scripted_stream and _no_quota fixtures

222b7e2

refactor: move scripted_stream to tests/helpers.py (importable on all…

7bdf2cc

… Python versions)

Simon-Free mentioned this pull request Apr 21, 2026

feat: tool_call_alias and depends_on for intra-turn tool scheduling #50

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: route checkpoint operational logs to stderr#47

fix: route checkpoint operational logs to stderr#47
Simon-Free wants to merge 6 commits intoSafeRL-Lab:mainfrom
Simon-Free:pr3-checkpoint-stderr-tokens

Simon-Free commented Apr 17, 2026 •

edited

Loading

Uh oh!

chauncygu commented Apr 18, 2026

Uh oh!

Simon-Free commented Apr 18, 2026 •

edited

Loading

Uh oh!

chauncygu commented Apr 18, 2026

Uh oh!

Simon-Free commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Simon-Free commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

What was dropped

Scope limit

Uh oh!

chauncygu commented Apr 18, 2026

Uh oh!

Simon-Free commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chauncygu commented Apr 18, 2026

Uh oh!

Simon-Free commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Simon-Free commented Apr 17, 2026 •

edited

Loading

Simon-Free commented Apr 18, 2026 •

edited

Loading